 outlier detection



Statistical Analysis of Nearest Neighbor Methods for Anomaly Detection

Xiaoyi Gu, Leman Akoglu, Alessandro Rinaldo

Neural Information Processing Systems

In this paper we are concerned with investigating the performance of NN-based methods for anomaly detection. We first show through extensive simulations that NN methods compare favorably to some of the other state-of-the-art algorithms for anomaly detection based on a set of benchmark synthetic datasets.
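As a rough illustration of what a nearest-neighbor anomaly score can look like, the sketch below scores each point by the distance to its k-th nearest neighbor. This is a common textbook variant and not necessarily the exact estimator analyzed in the paper; the function name `knn_anomaly_scores`, the choice `k=5`, and the toy data are assumptions made for the example.

```python
import numpy as np

def knn_anomaly_scores(X, k=5):
    """Score each row of X by the distance to its k-th nearest neighbor.

    Larger scores suggest more isolated (more anomalous) points.
    Assumes k < number of rows in X.
    """
    X = np.asarray(X, dtype=float)
    # Dense pairwise Euclidean distances (n x n); only suitable for small n.
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k is the k-th nearest *other* point.
    sorted_dists = np.sort(dists, axis=1)
    return sorted_dists[:, k]

# Toy data (hypothetical): a Gaussian cluster plus one distant point.
rng = np.random.default_rng(0)
inliers = rng.normal(size=(200, 2))
outlier = np.array([[8.0, 8.0]])
X = np.vstack([inliers, outlier])

scores = knn_anomaly_scores(X, k=5)
most_anomalous = int(np.argmax(scores))  # index of the highest-scoring point
```

The dense distance matrix keeps the sketch short; for larger datasets a spatial index or an approximate neighbor search would be the more usual choice.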







Cutting Through the Noise: On-the-fly Outlier Detection for Robust Training of Machine Learning Interatomic Potentials

Lam, Terry C. W., O'Neill, Niamh, Schran, Christoph, Schaaf, Lars L.

arXiv.org Machine Learning

The accuracy of machine learning interatomic potentials suffers from reference data that contains numerical noise. Often originating from unconverged or inconsistent electronic-structure calculations, this noise is challenging to identify. Existing mitigation strategies, such as manual filtering or iterative refinement of outliers, require either substantial expert effort or multiple expensive retraining cycles, making them difficult to scale to large datasets. Here, we introduce an on-the-fly outlier detection scheme that automatically down-weights noisy samples without requiring additional reference calculations. By tracking the loss distribution via an exponential moving average, this unsupervised method identifies outliers throughout a single training run. We show that this approach prevents overfitting and matches the performance of iterative refinement baselines with significantly reduced overhead. The method's effectiveness is demonstrated by recovering accurate physical observables, including diffusion coefficients, for liquid water from unconverged reference data. Furthermore, we validate its scalability by training a foundation model for organic chemistry on the SPICE dataset, where it reduces energy errors by a factor of three. This framework provides a simple, automated solution for training robust models on imperfect datasets across dataset sizes.
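The abstract does not spell out the weighting rule, so the following is only a minimal sketch of the general idea: keep exponential moving averages of the mean and variance of per-sample losses, and down-weight samples whose loss sits far above that running distribution. The class name `EmaOutlierWeights`, the `mean + n_sigma * std` threshold, and all default values are assumptions for illustration, not the authors' scheme.

```python
import numpy as np

class EmaOutlierWeights:
    """Hypothetical helper: per-sample loss weights from EMA loss statistics."""

    def __init__(self, decay=0.9, n_sigma=3.0, low_weight=0.0):
        self.decay = decay            # EMA decay factor (assumed value)
        self.n_sigma = n_sigma        # threshold width in standard deviations
        self.low_weight = low_weight  # weight given to flagged samples
        self.mean = None
        self.var = None

    def __call__(self, losses):
        losses = np.asarray(losses, dtype=float)
        if self.mean is None:
            # First batch: initialise the statistics and keep every sample.
            self.mean = losses.mean()
            self.var = losses.var()
            return np.ones_like(losses)

        threshold = self.mean + self.n_sigma * np.sqrt(self.var)
        weights = np.where(losses > threshold, self.low_weight, 1.0)

        # Update the EMA from unflagged samples only, so that outliers
        # do not inflate the running statistics.
        kept = losses[weights == 1.0]
        if kept.size:
            self.mean = self.decay * self.mean + (1 - self.decay) * kept.mean()
            self.var = self.decay * self.var + (1 - self.decay) * kept.var()
        return weights

# Usage with made-up per-sample losses from two consecutive batches.
weigher = EmaOutlierWeights()
weigher(np.array([1.0, 1.1, 0.9, 1.0]))              # initialises statistics
weights = weigher(np.array([1.0, 0.95, 25.0, 1.05]))  # third sample is flagged
```

In a training loop, the returned weights would multiply the per-sample losses before averaging, so flagged samples contribute little or nothing to the gradient for that step.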


Further Analysis of Outlier Detection with Deep Generative Models

Ziyu Wang

Neural Information Processing Systems

The recent, counter-intuitive discovery that deep generative models (DGMs) can frequently assign a higher likelihood to outliers has implications for both outlier detection applications and our overall understanding of generative modeling. In this work, we present a possible explanation for this phenomenon, starting from the observation that a model's typical set and high-density region may not coincide. From this vantage point, we propose a novel outlier test, the empirical success of which suggests that the failure of existing likelihood-based outlier tests does not necessarily imply that the corresponding generative model is uncalibrated. We also conduct additional experiments to help disentangle the impact of low-level texture versus high-level semantics in differentiating outliers. In aggregate, these results suggest that modifications to the standard evaluation practices and benchmarks commonly applied in the literature are needed.
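The gap between a typical set and a high-density region can be seen in a standard textbook case, a high-dimensional isotropic Gaussian. The sketch below is an illustration of that general fact only, not the outlier test proposed in the paper; the dimension `d`, the sample count, and the helper `log_density` are choices made for the example.

```python
import numpy as np

d = 1000  # dimension (assumed for illustration)
rng = np.random.default_rng(0)
samples = rng.normal(size=(2000, d))  # draws from N(0, I_d)

def log_density(x):
    """Log-density of the standard d-dimensional Gaussian."""
    return -0.5 * (x ** 2).sum(axis=-1) - 0.5 * d * np.log(2 * np.pi)

sample_logp = log_density(samples)
mode_logp = log_density(np.zeros(d))  # the origin is the density maximum

norms = np.linalg.norm(samples, axis=1)

# The origin has a higher density than every sample drawn from the model...
mode_beats_all_samples = bool(mode_logp > sample_logp.max())
# ...yet real samples concentrate in a thin shell of radius about sqrt(d),
# far from the origin, so the origin is not a typical draw.
mean_norm = float(norms.mean())
min_norm = float(norms.min())
```

A test that thresholds likelihood alone would accept the origin as an inlier here, even though no sample from the model lands anywhere near it.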


Automatic Unsupervised Outlier Model Selection

Neural Information Processing Systems

Given an unsupervised outlier detection task on a new dataset, how can we automatically select a good outlier detection algorithm and its hyperparameter(s) (collectively called a model)?